692 research outputs found
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks
Bayesian Optimization with Unknown Constraints
Recent work on Bayesian optimization has shown its effectiveness in global
optimization of difficult black-box objective functions. Many real-world
optimization problems of interest also have constraints which are unknown a
priori. In this paper, we study Bayesian optimization for constrained problems
in the general case that noise may be present in the constraint functions, and
the objective and constraints may be evaluated independently. We provide
motivating practical examples, and present a general framework to solve such
problems. We demonstrate the effectiveness of our approach on optimizing the
performance of online latent Dirichlet allocation subject to topic sparsity
constraints, tuning a neural network given test-time memory constraints, and
optimizing Hamiltonian Monte Carlo to achieve maximal effectiveness in a fixed
time, subject to passing standard convergence diagnostics.Comment: 14 pages, 3 figure
Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces
In practical Bayesian optimization, we must often search over structures with
differing numbers of parameters. For instance, we may wish to search over
neural network architectures with an unknown number of layers. To relate
performance data gathered for different architectures, we define a new kernel
for conditional parameter spaces that explicitly includes information about
which parameters are relevant in a given structure. We show that this kernel
improves model quality and Bayesian optimization results over several simpler
baseline kernels.Comment: 6 pages, 3 figures. Appeared in the NIPS 2013 workshop on Bayesian
optimizatio
Recommended from our members
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.Engineering and Applied Science
Online Meta-learning by Parallel Algorithm Competition
The efficiency of reinforcement learning algorithms depends critically on a
few meta-parameters that modulates the learning updates and the trade-off
between exploration and exploitation. The adaptation of the meta-parameters is
an open question in reinforcement learning, which arguably has become more of
an issue recently with the success of deep reinforcement learning in
high-dimensional state spaces. The long learning times in domains such as Atari
2600 video games makes it not feasible to perform comprehensive searches of
appropriate meta-parameter values. We propose the Online Meta-learning by
Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several
instances of a reinforcement learning algorithm are run in parallel with small
differences in the initial values of the meta-parameters. After a fixed number
of episodes, the instances are selected based on their performance in the task
at hand. Before continuing the learning, Gaussian noise is added to the
meta-parameters with a predefined probability. We validate the OMPAC method by
improving the state-of-the-art results in stochastic SZ-Tetris and in standard
Tetris with a smaller, 1010, board, by 31% and 84%, respectively, and
by improving the results for deep Sarsa() agents in three Atari 2600
games by 62% or more. The experiments also show the ability of the OMPAC method
to adapt the meta-parameters according to the learning progress in different
tasks.Comment: 15 pages, 10 figures. arXiv admin note: text overlap with
arXiv:1702.0311
Recommended from our members
Multi-Task Bayesian Optimization
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up -fold cross-validation. Lastly, our most significant contribution is an adaptation of a recently proposed acquisition function, entropy search, to the cost-sensitive and multi-task settings. We demonstrate the utility of this new acquisition function by utilizing a small dataset in order to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.Engineering and Applied Science
- …